
RANGER-5655: Dynamic unified ingestor registry for audit partition routing and service allowlists #1032

Open
ramackri wants to merge 10 commits into apache:master from ramackri:RANGER-5655-patch


@ramackri ramackri commented Jun 23, 2026

Contributor

What changes were proposed in this pull request?

Implements RANGER-5655: a dynamic unified ingestor registry for Ranger audit-ingestor so operators can change Kafka partition routing and per-repo service allowlists at runtime — without restarting ingestor pods.

The registry is a versioned JSON document in Kafka topic ranger_audit_partition_plan (1 partition, compacted). All ingestor replicas converge via PartitionPlanWatcher; AuditPartitioner routes on the hot path from in-memory state only.
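
The split between a watcher that converges on the latest plan and a partitioner that only reads memory can be illustrated with a small holder. This is a minimal sketch, not the PR's PartitionPlanHolder: the class names `PlanSnapshot` and `PlanHolderSketch` are hypothetical, and the rule "install only a strictly newer version" is an assumption about how replayed records from the compacted topic would be ignored.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified stand-in for the versioned plan document. */
final class PlanSnapshot {
    final long version;
    final String json;

    PlanSnapshot(long version, String json) {
        this.version = version;
        this.json = json;
    }
}

/** Hypothetical holder: a watcher thread installs plans, routing threads read them. */
final class PlanHolderSketch {
    private final AtomicReference<PlanSnapshot> current = new AtomicReference<>();

    /** Installs the candidate only if it is strictly newer; returns true when installed. */
    boolean installIfNewer(PlanSnapshot candidate) {
        while (true) {
            PlanSnapshot seen = current.get();
            if (seen != null && candidate.version <= seen.version) {
                return false; // stale or duplicate record (assumed to be ignored)
            }
            if (current.compareAndSet(seen, candidate)) {
                return true;
            }
            // Another thread installed a plan concurrently; loop and re-check against it.
        }
    }

    /** Hot-path read: no Kafka I/O, only a volatile read of the latest snapshot. */
    PlanSnapshot get() {
        return current.get();
    }
}
```

With this shape, every replica that consumes the same records ends up holding the highest version it has seen, regardless of the order in which duplicates arrive.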

Feature flag (default off): ranger.audit.ingestor.kafka.partition.plan.dynamic.enabled=false

Problem

| Job | Static behavior today | Pain |
| --- | --- | --- |
| Service allowlist | ranger.audit.ingestor.service.*.allowed.users in site XML at startup | Onboard repo / change allowlist → XML edit + restart all ingestor pods |
| Partition routing | kafka.configured.plugins + per-plugin overrides at startup | Promote hot plugin or grow partitions → restart; contiguous ranges can reshuffle later plugins |

Solution (this PR update)

Simplified REST control plane — three endpoints only:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/audit/partition-plan | Read current plan (plugins, buffer, services, version) |
| POST | /api/audit/partition-plan/plugins | Onboard plugin: dedicated partitions + mandatory non-empty services map |
| PATCH | /api/audit/partition-plan/plugins/{pluginId} | Update onboarded plugin: scale and/or addServices / updateServices / removeServices |

Removed (consolidated above): PATCH /api/audit/partition-plan, POST /api/audit/partition-plan/services, separate promote-only / scale-only flows.

New request models: OnboardPlugin, UpdatePlugin. Service entries stored with optional pluginId for repo→plugin ownership.


Code changes in this commit (REST simplification slice)

| Area | Change |
| --- | --- |
| AuditREST.java | Three partition-plan endpoints only |
| PartitionPlanService.java | onboardPlugin(), updatePlugin() |
| PartitionPlanAllocator.java | Onboard/update with service allowlist mutations |
| PartitionPlanRequestValidator.java | Mandatory services on POST onboard |
| OnboardPlugin.java, UpdatePlugin.java | New REST DTOs |
| ServiceAllowlistEntry.java | Optional pluginId for ownership |
| Unit tests | PartitionPlanRequestValidatorTest + mutation/allocator updates (94 partition-plan tests) |

How was this patch tested?

Unit tests + quality gates

mvn verify -pl audit-server/audit-ingestor -Drat.skip=true \
  -Dtest='PartitionPlan*Test,ServiceAllowlist*Test,AuthToLocalRuleComposerTest'

| Gate | Result |
| --- | --- |
| Partition-plan + allowlist tests | 94/94 pass |
| Checkstyle | Pass |
| PMD | Pass |

Focused run:

mvn test -pl audit-server/audit-ingestor \
  -Dtest='PartitionPlan*Test,ServiceAllowlist*Test' -Drat.skip=true

Manual testing (local Docker audit lab)

Manual validation used a local Docker Compose audit environment that mirrors a production-style Ranger audit deployment: Kerberos (KDC + plugin keytabs), Kafka with both the audit data topic (ranger_audits) and the compacted registry topic (ranger_audit_partition_plan), a running audit-ingestor instance on port 7081, Solr (with audit dispatcher), Postgres-backed Ranger Admin, and real plugin containers for HDFS and Hive. All partition-plan REST calls used SPNEGO (Kerberos) as the ingestor HTTP service principal; plugin audit posts used each plugin’s own keytab.

The ingestor was rebuilt with this branch’s code (including mandatory services validation on onboard) before running the scenarios below.


1. Environment readiness

Before exercising the new API, the lab was brought to a healthy state: ingestor health endpoint returned 200, Kafka was reachable, the plan watcher was active after enabling dynamic mode, and GET /api/audit/partition-plan returned a coherent plan JSON (version, plugins, buffer, services, topicPartitionCount matching the live ranger_audits partition count).


2. Static mode unchanged (feature flag off)

With ranger.audit.ingestor.kafka.partition.plan.dynamic.enabled=false (default):

  • GET /api/audit/partition-plan returned 503 — partition-plan admin API correctly disabled.
  • GET /api/audit/health still returned 200.
  • Normal plugin audit delivery (HDFS smoke, Solr indexing) continued to work.

This confirms existing deployments are unaffected when the flag stays off.


3. Enabling dynamic mode and reading the registry

Dynamic mode was turned on (dynamic.enabled=true) with a fresh or reset plan topic where appropriate. After ingestor restart:

  • Kafka showed ranger_audit_partition_plan with one partition and compacted cleanup policy.
  • GET /api/audit/partition-plan returned 200 with version ≥ 1, populated services from XML bootstrap, and topicPartitionCount equal to kafka-topics --describe ranger_audits.
  • Ingestor logs confirmed PartitionPlanWatcher started and the partitioner loaded the in-memory plan.

4. Simplified REST API — onboard, validation, scale

All mutations used expectedVersion from the preceding GET.

Negative validation (new behavior)

  • POST /api/audit/partition-plan/plugins with pluginId, partitionCount, and expectedVersion but omitting services → 400 Bad Request with a message indicating that services are required. This was the primary regression guard for the API consolidation.
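
The mandatory-services rule can be sketched as a small request check. This is an illustration only, not the PR's PartitionPlanRequestValidator: the classes `OnboardRequestSketch` and `OnboardValidatorSketch` are hypothetical, the field names are taken from this description, and the checks on pluginId and partitionCount are assumptions added for completeness.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical request shape using the field names mentioned in this description. */
final class OnboardRequestSketch {
    final String pluginId;
    final int partitionCount;
    final long expectedVersion;
    final Map<String, List<String>> services; // repo name -> allowedUsers

    OnboardRequestSketch(String pluginId, int partitionCount, long expectedVersion,
                         Map<String, List<String>> services) {
        this.pluginId = pluginId;
        this.partitionCount = partitionCount;
        this.expectedVersion = expectedVersion;
        this.services = services;
    }
}

final class OnboardValidatorSketch {
    /** Returns null when the request is acceptable, else a message for a 400 response. */
    static String validate(OnboardRequestSketch request) {
        if (request.pluginId == null || request.pluginId.trim().isEmpty()) {
            return "pluginId is required"; // assumed check
        }
        if (request.partitionCount <= 0) {
            return "partitionCount must be positive"; // assumed check
        }
        if (request.services == null || request.services.isEmpty()) {
            return "services are required on onboard"; // the rule described above
        }
        return null;
    }
}
```

A caller would map a non-null result to 400 Bad Request and only then proceed to the version check and allocation.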

Successful onboard

  • Onboarded a buffer-only plugin (e.g. storm or ambari) in a single call with a non-empty services map (repo → allowedUsers). Response 200; plan version incremented; plugin appeared under plugins with dedicated partition IDs taken from the buffer (or tail-grown when needed); corresponding repo entries appeared under services.

Multi-repo onboard in one version bump

  • Onboarded trino with two repos in one POST (dev_trino and dev_trino2, each with its own allowedUsers) → 200; both repos present in services with pluginId ownership tagged to trino.

Optimistic locking

  • Repeated onboard with a stale expectedVersion → 409 Conflict with current plan in the response body.
  • Attempted to onboard hdfs again when it already had dedicated partitions → 400 (conflicting state).
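
The stale-version behavior follows the usual optimistic-locking pattern: a write carries the version it was based on and is rejected if the plan has moved on. The sketch below is a simplified in-memory illustration with hypothetical class names; the real conflict exception is described as carrying the current plan, whereas this one carries only the current version.

```java
/** Hypothetical conflict type, which a REST layer would map to HTTP 409. */
final class VersionConflictSketch extends RuntimeException {
    final long currentVersion;

    VersionConflictSketch(long currentVersion) {
        super("stale expectedVersion; current version is " + currentVersion);
        this.currentVersion = currentVersion;
    }
}

/** Hypothetical plan store that tracks only the version counter. */
final class OptimisticPlanSketch {
    private long version;

    OptimisticPlanSketch(long initialVersion) {
        this.version = initialVersion;
    }

    /** Applies one mutation if expectedVersion matches; returns the new version. */
    synchronized long mutate(long expectedVersion) {
        if (expectedVersion != version) {
            throw new VersionConflictSketch(version);
        }
        version++;
        return version;
    }
}
```

A client that receives the conflict would re-read the plan with GET, take the new version, and retry the mutation.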

Scale after onboard

  • PATCH /api/audit/partition-plan/plugins/{pluginId} with additionalPartitions → 200; new partition IDs appended at the tail (append-only); ranger_audits grown via AdminClient when required; subsequent GET showed stable version and layout.
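
Append-only scaling means existing partition IDs never move; new IDs are added at the end of the plugin's assignment. The sketch below assumes the simplest case, where every additional partition is a new tail ID starting at the current topic partition count. The class `TailGrowthSketch` is hypothetical, and the actual allocator may behave differently (for example, it may reuse buffer partitions first).

```java
import java.util.ArrayList;
import java.util.List;

final class TailGrowthSketch {
    /**
     * Returns the plugin's assignment after appending `additional` new tail IDs.
     * Assumes the audit topic currently has IDs 0..topicPartitionCount-1 and will be
     * grown to topicPartitionCount + additional before the new plan is published.
     */
    static List<Integer> scale(List<Integer> assigned, int topicPartitionCount, int additional) {
        if (additional <= 0) {
            throw new IllegalArgumentException("additional must be positive");
        }
        List<Integer> result = new ArrayList<>(assigned); // existing IDs keep their order
        for (int i = 0; i < additional; i++) {
            result.add(topicPartitionCount + i);
        }
        return result;
    }
}
```

For example, a plugin holding partitions [7, 8] on a 12-partition topic that asks for two more would end up with [7, 8, 12, 13] under these assumptions.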

Idempotency check

  • After mutations, GET /api/audit/partition-plan without restart showed the same version and layout as the last successful write.

5. End-to-end plugin flows (allowlist + routing)

These tests prove the full path: registry onboard → allowlist enforcement → Kafka produce → correct partition assignment.

HDFS

  1. Onboarded plugin hdfs with repo dev_hdfs and allowlist hdfs,nn via POST .../plugins (mandatory services).
  2. From the Hadoop container, posted a test audit batch to POST /api/audit/access?serviceName=dev_hdfs&appId=hdfs using the hdfs Kerberos principal → 200; authenticatedUser mapped to short name hdfs.
  3. Consumed the corresponding record from ranger_audits and verified the partition number was in the hdfs assignment list from the plan (not the buffer pool).
  4. Optionally scaled hdfs with PATCH .../plugins/hdfs and repeated the access + Kafka partition check — routing still respected the updated plan.

Hive

  1. Onboarded hiveServer2 with repo dev_hive and allowlist ["hive"] in the same onboard POST.
  2. From the Hive container, posted audits with the hive principal → 200.
  3. Kafka record landed on a partition in the hiveServer2 dedicated set (e.g. partitions [7, 8] after prior lab mutations).
  4. Confirmed ingestor logs showed auth_to_local rules recomposed after the onboard (allowlist union updated).
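
The "allowlist union" mentioned in the last step is the set of all short names that appear in any repo's allowedUsers. A minimal sketch of that computation follows; the class `AllowlistUnionSketch` is hypothetical, and how the union is then turned into auth_to_local rules is not shown because the rule format is not described here.

```java
import java.util.Collection;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class AllowlistUnionSketch {
    /** Collects every allowed user across all repos into one sorted, de-duplicated set. */
    static SortedSet<String> union(Map<String, ? extends Collection<String>> services) {
        SortedSet<String> all = new TreeSet<>();
        for (Collection<String> users : services.values()) {
            if (users != null) {
                all.addAll(users);
            }
        }
        return all;
    }
}
```

Onboarding dev_hive with allowlist ["hive"] on top of dev_hdfs with hdfs and nn would change the union from {hdfs, nn} to {hdfs, hive, nn}, which is the kind of change that would trigger a recomposition.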

HDFS already onboarded path

  • Where hdfs was already present in the plan from an earlier run, the lab skipped re-onboard and verified allowlist + routing still held: access accepted, partition ∈ plan.
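
The partition checks in the HDFS and Hive flows above rely on one property: a record from an onboarded plugin lands in that plugin's dedicated set, and anything else falls back to the buffer pool. The sketch below illustrates that property with a hypothetical `RoutingSketch` class. Using a key hash to choose within the pool is an assumption; Math.floorMod is used so that negative hash codes still produce a valid index.

```java
import java.util.List;
import java.util.Map;

final class RoutingSketch {
    /**
     * Picks a partition for a record: the plugin's dedicated partitions if it has any,
     * otherwise the shared buffer pool. Selection within the pool is by key hash (assumed).
     */
    static int route(String pluginId, String key,
                     Map<String, List<Integer>> dedicated, List<Integer> buffer) {
        List<Integer> pool = dedicated.get(pluginId);
        if (pool == null || pool.isEmpty()) {
            pool = buffer;
        }
        if (pool.isEmpty()) {
            throw new IllegalStateException("no partitions available for " + pluginId);
        }
        // floorMod keeps the index in [0, size) even when hashCode() is negative.
        return pool.get(Math.floorMod(key.hashCode(), pool.size()));
    }
}
```

Because the lookup touches only in-memory maps and lists, it fits the requirement that routing on the hot path never waits on Kafka.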

6. Allowlist behavior (authorization layer)

Separate from partition routing:

  • A principal in services[repo].allowedUsers (after auth_to_local) → 200 on /access.
  • After tightening allowlist via PATCH .../plugins/{pluginId} with updateServices to remove the principal → 403 on the same POST.
  • Restoring the allowlist → 200 again.
  • Posting audits claiming a different repo than the principal is allowed for → 403 (cross-repo denial).

This confirms the unified services map in the registry drives authorization without XML restart.
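
The observed 200 and 403 outcomes are consistent with a simple membership check against the per-repo allowlist. The sketch below uses a hypothetical `AllowlistSketch` class and omits the XML fallback described elsewhere in this PR; returning 403 for a repo with no entry is an assumption.

```java
import java.util.Map;
import java.util.Set;

final class AllowlistSketch {
    /**
     * Returns 200 when the short name (after auth_to_local) is in services[repo].allowedUsers,
     * otherwise 403. A repo with no entry is treated as denied (assumed).
     */
    static int statusFor(String repo, String shortName, Map<String, Set<String>> services) {
        Set<String> allowed = services.get(repo);
        boolean permitted = allowed != null && allowed.contains(shortName);
        return permitted ? 200 : 403;
    }
}
```

Cross-repo denial falls out of the same check: a principal allowed for dev_hive is simply absent from the set stored under dev_hdfs.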


7. What did not change

| Area | Observation |
| --- | --- |
| Audit spool / recovery | Kafka produce failures still spool to per-pod local files; retry path unchanged. Dynamic mode does not alter recovery semantics. |
| Solr / HDFS dispatchers | No reconfiguration needed; consumers rebalance when ranger_audits grows. |
| Static mode | No plan topic usage; no watcher; partition-plan REST returns 503. |

8. Summary of manual test outcomes

| Scenario | Result |
| --- | --- |
| Static mode regression | Pass |
| Dynamic enable + plan bootstrap | Pass |
| Onboard without services → 400 | Pass |
| Onboard with mandatory services | Pass |
| Multi-repo single onboard | Pass |
| Stale version → 409 | Pass |
| Duplicate plugin → 400 | Pass |
| PATCH scale (append-only) | Pass |
| HDFS: access + Kafka partition ∈ plan | Pass |
| Hive: onboard + access + Kafka partition ∈ plan | Pass |
| Allowlist toggle via PATCH update | Pass (where exercised) |

…gestor: runtime Kafka partition routing and per-repo service allowlists via compacted topic + REST, without ingestor restarts. Feature flag default off.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ramackri ramackri requested review from mneethiraj and rameeshm June 23, 2026 13:43
ramk and others added 3 commits June 23, 2026 19:18
Use hdfs-only allowlist for dev_hdfs, remove unused dev_solr allowlist
entry, fix buffer partition example math, and add detailed manual test
documentation for PR apache#1032.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep dev_solr service allowlist property (remove only the stray blank line
the feature commit added). Retain hdfs-only dev_hdfs allowlist and buffer
partition example fix. Remove dev-support/RANGER-5655-PR-TEMPLATE.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Correct import order, remove unused import, use static requireNonNull,
drop duplicate test import, and align PartitionPlan imports with
checkstyle rules reported on PR apache#1032.

Co-authored-by: Cursor <cursoragent@cursor.com>
ramk and others added 5 commits June 23, 2026 21:20
…n layout.

Ship the standard 14-plugin lab list in ranger-audit-ingestor-site.xml with
dynamic partition plan disabled by default; update buffer partition example
to 14 × 3 + 9 = 51 total.

Co-authored-by: Cursor <cursoragent@cursor.com>
Consolidate partition-plan mutations into three endpoints: GET plan,
POST onboard plugin (mandatory non-empty services map), and PATCH update
plugin. Remove PATCH /partition-plan and POST /services. Add validator
and E2E coverage for mandatory services on onboard.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep REST simplification to Java sources and unit tests only.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop unused PromotePlugin, OnboardService, PluginScale, and
PartitionPlanReplacement after REST API consolidation. Cache partition-plan
admin users and dynamic.enabled flag in PartitionPlanService constructor.
Refactor partition-plan helpers and AuditREST partition-plan paths to
match Ranger review style with one return statement per method.

Copilot AI left a comment

Contributor


Pull request overview

This PR implements the “dynamic unified ingestor registry” for audit-ingestor by introducing a Kafka-compacted, versioned partition-plan document (including per-repo service allowlists) and a simplified REST control plane to onboard/update plugins at runtime without restarting ingestor pods.

Changes:

  • Adds Kafka-backed partition-plan registry plumbing (bootstrap, watcher, validator, update applier, registry client) with in-memory hot-path state via PartitionPlanHolder.
  • Simplifies/introduces partition-plan REST endpoints in AuditREST for GET /partition-plan, POST /partition-plan/plugins, and PATCH /partition-plan/plugins/{pluginId} with request validation.
  • Introduces dynamic-mode allowlist behavior (registry-first with XML fallback) and composes auth_to_local rules from the global allowlist union when dynamic mode is enabled.

Reviewed changes

Copilot reviewed 50 out of 50 changed files in this pull request and generated 3 comments.

File Description
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/ServiceAllowlistResolverTest.java Unit tests for registry-first allowlist resolution with XML fallback.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/ServiceAllowlistBootstrapTest.java Unit tests for loading/merging allowlists from XML properties into plan services.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanValidatorTest.java Unit tests for plan shape validation and append-only constraints.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanUpdateApplierTest.java Unit tests for applying compacted Kafka plan records into memory.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanServiceTest.java Unit tests for dynamic-enabled flag and in-memory plan reads.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanServiceMutationTest.java Unit tests for onboard/update flows, optimistic locking, and topic-grow failure handling.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanRequestValidatorTest.java Unit tests for REST request model validation (mandatory services on onboard).
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanKafkaConfigTest.java Unit tests for plan-topic config resolution and dynamic flag parsing.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanHolderTest.java Unit tests for plan holder allowlist access and install validation.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanBootstrapTest.java Unit tests for bootstrap plan layout and overrides.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanBootstrapSupportTest.java Unit tests for empty-registry bootstrap and peer-publish adoption.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanAllocatorTest.java Unit tests for onboard/update allocation behavior and service ownership rules.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/model/PartitionPlanJsonTest.java Unit tests for JSON round-trip and semantic equality.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/partition/AuthToLocalRuleComposerTest.java Unit tests for composing/applying auth_to_local rules based on allowlists and dynamic mode.
audit-server/audit-ingestor/src/test/java/org/apache/ranger/audit/producer/kafka/AuditPartitionerDynamicTest.java Unit tests for dynamic partition routing behavior and concurrency.
audit-server/audit-ingestor/src/main/resources/conf/ranger-audit-ingestor-site.xml Updates config docs and adds dynamic partition-plan properties.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/server/AuditServerConfig.java Allows overriding ingestor config path via -Daudit.config.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/rest/AuditREST.java Adds simplified partition-plan REST endpoints and integrates dynamic allowlist enforcement.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/ServiceAllowlistResolver.java Implements registry-first per-repo allowlist authorization with XML fallback.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/ServiceAllowlistBootstrap.java Loads allowlists from XML properties and merges into plans when services are missing.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PrimaryCatalogRule.java Holds parsed auth_to_local catalog rules and mapped short names.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanWatcher.java Background watcher that bootstraps and refreshes in-memory plan from Kafka.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanValidator.java Validates plan structure, services, and append-only update semantics.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanUpdateApplier.java Applies newer plan versions from compacted Kafka records into PartitionPlanHolder.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanService.java Service layer for REST reads/mutations with optimistic locking and topic growth.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanRequestValidator.java Validates OnboardPlugin / UpdatePlugin request bodies.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanRegistryFactory.java Factory for opening Kafka-backed plan registries for REST mutations.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanRegistry.java Interface for durable partition-plan storage (Kafka compacted topic).
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanKafkaConfig.java Centralizes partition-plan Kafka config resolution and security settings.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanHolder.java Atomic in-memory plan holder used by hot-path routing and allowlist resolution.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanBootstrapConfig.java Represents bootstrap inputs derived from legacy producer/XML config.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/PartitionPlanBootstrap.java Bootstraps v1 plan from legacy config and seeds Kafka registry when empty.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/model/UpdatePlugin.java REST DTO for plugin updates (scale + allowlist mutations).
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/model/ServiceAllowlistEntry.java Plan DTO for per-repo allowlists with optional plugin ownership.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/model/PluginPartitionAssignment.java DTO for explicit/contiguous partition assignments.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/model/PartitionPlan.java Versioned plan DTO with JSON serialization/deserialization + validation.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/model/OnboardPlugin.java REST DTO for onboarding a plugin (mandatory services).
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/KafkaPartitionPlanRegistry.java Kafka implementation of the compacted plan registry.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/KafkaAuditTopicPartitionGrower.java Grows audit topic partitions before plans reference new tail IDs.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/exception/PartitionPlanException.java Base exception for plan validation and mutation errors.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/exception/PartitionPlanConflictException.java Optimistic-lock conflict exception carrying the current plan (HTTP 409).
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/constants/PartitionPlanConstants.java Constants for initial plan version and consumer group ids.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/AuthToLocalRuleComposer.java Composes and applies auth_to_local rules based on allowlist union in dynamic mode.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/partition/AuthToLocalRuleCatalog.java Parses the auth_to_local catalog and composes a reduced active ruleset.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/AuditPartitioner.java Adds dynamic plan routing path using PartitionPlanHolder.
audit-server/audit-ingestor/src/main/java/org/apache/ranger/audit/producer/kafka/AuditMessageQueue.java Starts/stops PartitionPlanWatcher when dynamic mode is enabled.
audit-server/audit-common/src/test/java/org/apache/ranger/audit/utils/AuditMessageQueueUtilsTest.java Adds tests for building Kafka AdminClient config.
audit-server/audit-common/src/main/java/org/apache/ranger/audit/utils/AuditMessageQueueUtils.java Adds plan-topic creation, topic-exists probing, admin config helper, and topic grow helper.
audit-server/audit-common/src/main/java/org/apache/ranger/audit/server/AuditServerConstants.java Adds dynamic partition-plan constants and changes default configured plugins to empty.


… code

Use Math.floorMod for buffer hash routing, align PartitionPlanHolder
Javadoc with validator rules, and update configured.plugins REST doc.

Co-authored-by: Cursor <cursoragent@cursor.com>

2 participants